A hybrid text mining system for chemical entity recognition and classification using dictionary look-up and pattern matching @ BeCalm challenge evaluation workshop

Authors

  • Kalpana Raja
  • Sabenabanu Abdulkadhar
  • Lam C Tsoi
  • Jeyakumar Natarajan
Abstract

Chemicals used as therapeutics and investigational agents have recently received much attention in clinical research and applications. However, automated recognition and categorization of chemical entities in biomedical text is challenging because of the wide variety of morphologies and nomenclatures. We present here a hybrid text mining system that combines a chemical lexicon with patterns for recognition and categorization. We applied this approach to identify chemical entities in the patent abstracts of the BioCreative V.5 Chemical Entity Mention Recognition (CEMP) corpus. We also compared the hybrid approach with the "traditional" lexicon-based method, and showed that the hybrid approach achieves better performance (i.e. precision, recall, and F-score) than the lexicon-based method.
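
As a rough illustration of how dictionary look-up and pattern matching can be combined, the Python sketch below first searches a small lexicon, then applies regular-expression patterns, and finally keeps the longest non-overlapping matches. It is not the authors' implementation: the lexicon entries, the two regular expressions, the labels `FORMULA` and `LOCANT`, and the function name `find_chemicals` are all assumptions made for this example.

```python
import re

# Hypothetical mini-lexicon; a real system would load a large chemical dictionary.
LEXICON = {"aspirin", "acetic acid", "sodium chloride"}

# Hypothetical patterns, chosen only for illustration.
PATTERNS = [
    # Two or more element-like symbols with optional counts, e.g. NaCl, H2O.
    ("FORMULA", re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")),
    # Names that begin with numeric locants, e.g. 2-methylpropane.
    ("LOCANT", re.compile(r"\b\d+(?:,\d+)*-[a-z]+\b")),
]


def find_chemicals(text):
    """Return non-overlapping (start, end, mention, source) tuples in text order."""
    candidates = []

    # Step 1: dictionary look-up (case-insensitive, whole-word matches).
    for term in LEXICON:
        term_re = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for m in term_re.finditer(text):
            candidates.append((m.start(), m.end(), "lexicon"))

    # Step 2: pattern matching for mentions the lexicon does not list.
    for label, pattern in PATTERNS:
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), label))

    # Step 3: resolve overlaps by preferring longer spans, then earlier ones.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    chosen = []
    for start, end, source in candidates:
        if all(end <= s or start >= e for s, e, _ in chosen):
            chosen.append((start, end, source))

    chosen.sort()
    return [(s, e, text[s:e], src) for s, e, src in chosen]


if __name__ == "__main__":
    sample = "Aspirin and NaCl were dissolved in 2-methylpropane."
    for hit in find_chemicals(sample):
        print(hit)
```

The `source` field doubles as a crude category label here. The formula pattern will also fire on ordinary acronyms such as "DNA", which is one reason a practical system needs filtering or a much richer pattern set.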

Similar resources

Olelo’s named-entity recognition web service in the BeCalm TIPS task

Named-entity recognition (NER) is an important preliminary task in many text mining systems. However, few web services are currently freely available for use. The BeCalm TIPS challenge aims to evaluate web services for biomedical NER in terms of reliability and performance. We participated with our dictionary-based NER, which is part of our Olelo question answering system. Since the start of th...

Improving Persian Named Entity Recognition Using the Ezafe Marker (Kasre Ezafe)

Named entity recognition is a process that identifies people's names, names of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), dates, currencies and percentages in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

NTTMU-SCHEMA BeCalm API in BioCreative V.5

With the emergence of new experimental techniques, there has been a remarkable increase in the amount of available biomedical data. Processing and mining large volumes of chemistry data now presents a challenge. To address it, we developed SCHEMA (Spark-based CHEMicAl entity recognizer), a robust and efficient chemical entity recognition system on top of Apa...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches are conventionally used to extract named entities from a text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well in the Persian language if they are trained with good features. To get good performanc...

Publication date: 2017